The effect of N-gram indexing on Arabic documents retrieval

Author

  • Emad Fawzi Al-Shalabi
Abstract

This article compares 3-gram and 4-gram term indexing for Arabic document retrieval. Query-document similarity is computed for single-term and two-term queries over corpora of Arabic-language documents collected from Arabic news websites available online.
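
The abstract does not say exactly how n-gram terms are extracted or how similarity is scored, so the following is only a minimal sketch under stated assumptions: each word is split into overlapping character n-grams (n = 3 or n = 4), each text becomes a vector of raw n-gram counts, and query-document similarity is taken to be cosine similarity. The helper names `char_ngrams`, `cosine`, and `rank`, and the three toy documents, are hypothetical illustrations, not the author's data or method.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count overlapping character n-grams inside each whitespace-separated word.

    Assumptions (not taken from the paper): no boundary padding, no
    n-grams spanning two words, and words shorter than n kept whole.
    """
    grams: Counter = Counter()
    for word in text.split():
        if len(word) < n:
            grams[word] += 1  # too short to split; index the word itself
            continue
        for i in range(len(word) - n + 1):
            grams[word[i:i + n]] += 1
    return grams


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query: str, docs: dict, n: int) -> list:
    """Score every document against the query and sort by descending similarity."""
    q = char_ngrams(query, n)
    scores = [(doc_id, cosine(q, char_ngrams(text, n))) for doc_id, text in docs.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy corpus standing in for the news-article collection.
docs = {
    "d1": "المكتبة العامة في المدينة",  # "the public library in the city"
    "d2": "الطقس حار اليوم",           # "the weather is hot today"
    "d3": "كتب جديدة في المعرض",       # "new books at the exhibition"
}

for n in (3, 4):
    print(n, rank("مكتبة", docs, n))          # single-term query ("library")
    print(n, rank("مكتبة المدينة", docs, n))  # two-term query ("city library")
```

In this toy corpus, d3 shares the 3-gram كتب with the query مكتبة, so it receives a nonzero score with n = 3 but none with n = 4: shorter grams match related word forms more loosely, while longer grams are more selective. This only illustrates the kind of trade-off being compared; it is not a reproduction of the paper's results.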

Similar resources

Improving KNN Arabic Text Classification with N-Grams Based Document Indexing

Text classification is the task of assigning a document to one or more predefined categories based on its contents. This paper presents the results of classifying Arabic-language documents with the KNN classifier, once using N-grams (namely unigrams and bigrams) for document indexing, and once using the traditional single-term indexing method (bag of words), which supposes...

A Hybrid Method N-Grams-TFIDF with radial basis for indexing and classification of Arabic documents

In this paper, we propose a hybrid system for contextual and semantic indexing of Arabic documents, improving on classical models based on n-grams and the TFIDF model. This new approach takes into account the concept of the semantic vicinity of terms. In fact, we proceed by calculating the similarity between words using a hybridization of NGRAMs-TFIDF statistical measures and a...

Study of Indexing Techniques to Improve the Performance of Information Retrieval in Telugu Language

Information Retrieval Systems (IRS) are very popular on the World Wide Web. The availability of text information related to all types of objects, such as documents, web pages, images, videos, and audio files on the web, is increasing exponentially day by day. When the text repository grows to the maximum extent of the memory size in the server, the methods used to find a particular text unit eithe...

Uniform Indexing and Retrieval Scheme for Chinese, Japanese, and Korean

This paper reports on our work at the third NTCIR workshop on the subtasks of Chinese, Japanese, and Korean monolingual information retrieval (IR). A Chinese IR system is applied to all document sets in these three languages. Based on the n-gram indexing model and a phrase formulation method to extract longer key terms for indexing, no language-dependent modifications were made to apply the sys...

Swoogle: A Semantic Web Search and Metadata Engine

Swoogle is a crawler-based indexing and retrieval system for the Semantic Web, i.e., for Web documents in RDF or OWL. It extracts metadata for each discovered document, and computes relations between documents. Discovered documents are also indexed by an information retrieval system which can use either character N-Gram or URIrefs as keywords to find relevant documents and to compute the simila...

Publication date: 2017